and H32. The NOEs clearly define the orientation of the ligand
peptide on the WW domain.

The proline-containing part of peptide GTPPPPYTVG was
restrained into a polyproline helix type II, and the structure of the
complex was calculated using a standard simulated annealing
protocol 15 (X-PLOR 16) and the ten intermolecular NOEs as
constraints. The result shows smooth contacts between the
ligand and the domain (Fig. 3). The central prolines P4' and P5'
contact W39, and the carbonyl group of P6' points towards the OH
group of the conserved residue Y28. The peptide tyrosyl residue
Y7' is accommodated in a hydrophobic pocket formed by L30 and
H32. These contacts are well defined by six NOEs between the
aromatic ring of Y7' and the side chains of L30 and H32. Y7' could
form a hydrogen bond to the histidine ring but also to Q35, whose
chemical shifts change strongly on peptide binding (Fig. 1).
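
How intermolecular NOEs act as distance constraints in such a calculation can be illustrated with a flat-bottomed penalty term. The Python sketch below is not the X-PLOR protocol used for this structure: the function names, the 1.8 Å lower bound, the force constant, the atom labels, the coordinates and the 4.0 Å upper bound are all assumptions chosen for illustration.

```python
import math

def noe_penalty(d, upper, lower=1.8, k=1.0):
    """Flat-bottomed harmonic penalty for one NOE distance restraint.

    Zero while lower <= d <= upper, quadratic outside that window.
    The default lower bound and force constant are illustrative assumptions.
    """
    if d > upper:
        return k * (d - upper) ** 2
    if d < lower:
        return k * (lower - d) ** 2
    return 0.0

def total_noe_penalty(coords, restraints, k=1.0):
    """Sum the penalties over restraints given as (atom_i, atom_j, upper_bound)."""
    total = 0.0
    for atom_i, atom_j, upper in restraints:
        d = math.dist(coords[atom_i], coords[atom_j])  # Euclidean distance in Å
        total += noe_penalty(d, upper, k=k)
    return total

# Hypothetical coordinates (Å) and a hypothetical 4.0 Å upper bound;
# none of these numbers are taken from the paper.
coords = {
    "Y7' ring H": (0.0, 0.0, 0.0),
    "L30 side-chain H": (3.0, 4.0, 0.0),
}
restraints = [("Y7' ring H", "L30 side-chain H", 4.0)]

print(total_noe_penalty(coords, restraints))
```

A simulated annealing run would minimize a term of this kind together with the covalent geometry and non-bonded terms, so that restraints already satisfied contribute nothing and only violated distances pull on the structure.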

The aromatic residues at positions 39 and 28, a hydrophobic
residue at position 30 and a histidine at position 32 all tend to be
conserved (Fig. 1). Our structure shows that these are the residues
that are in contact with the peptide. The importance of this
hydrophobic surface is underscored by the low binding affinity
of mutants H32A, L30K and Q35A to both peptides (except in one
case, see Table 1; E.B. et al., unpublished results). The structure of
the mutants remained intact, as judged by their two-dimensional
NMR spectra. The binding affinities of peptides in which the
tyrosine residue of SPPPPY^V is substituted with alanine, leucine
or phenylalanine show that the YAP domain is specific for a
tyrosine-containing motif. In the first two cases there is no binding,
and the peptide containing phenylalanine has only a weak affinity
(Table 1). Thus, the tyrosine in the PPxY motif may be needed for
the interaction of ligands with a set of WW domains. The importance
of the polyproline motif is shown by the lack of binding when the
two centre prolines in this peptide are replaced by alanines,
whereas replacement of the first proline has no effect (Table 1).
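
The sequence logic of these substitutions can be checked with a simple pattern match. The sketch below is illustrative only: the function name is hypothetical, and the variants are built by hand from the ligand GTPPPPYTVG rather than taken from the peptides of Table 1. A match says only that the PPxY pattern is present; it does not predict affinity.

```python
import re

# PPxY in one-letter code: two prolines, any residue, then tyrosine.
PPXY = re.compile(r"PP.Y")

def has_ppxy(peptide: str) -> bool:
    """Return True if the peptide sequence contains a PPxY motif."""
    return PPXY.search(peptide.upper()) is not None

# The ligand from the text plus hand-made variants (hypothetical examples).
peptides = {
    "ligand": "GTPPPPYTVG",
    "Tyr to Ala": "GTPPPPATVG",
    "Tyr to Phe": "GTPPPPFTVG",
    "central Pro pair to Ala": "GTPAAPYTVG",
    "first Pro to Ala": "GTAPPPYTVG",
}

for label, seq in peptides.items():
    print(f"{label:24s} {seq}  PPxY present: {has_ppxy(seq)}")
```

In this toy set only the unmodified ligand and the variant lacking the first proline retain the pattern, which parallels the binding behaviour described above.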

Our structure confirms the hypothesis 4 that the WW domain is a 
binding module for proline-rich ligands. The PPxY motif may not 
be the only ligand for WW domains: for instance, it is not present 
in the proline-rich tail of formin that interacts with SH3 and WW 
domains 10, where sequence motifs such as PPxLP are found. The
structure indicates that hydrophobic residues replacing tyrosine of 
the ligand could be accommodated on the surface of those 
domains that contain hydrophobic residues other than leucine at 
position 30. 

Note added in proof: An involvement of hYAP in retroviral 
budding through its WW domain was recently suggested 27.

PDZ domains (also known as DHR domains or GLGF repeats) are
~90-residue repeats found in a number of proteins implicated in
ion-channel and receptor clustering, and the linking of receptors
to effector enzymes 1. PDZ domains are protein-recognition
modules; some recognize proteins containing the consensus
carboxy-terminal tripeptide motif S/TXV with high specificity 2-4.
Other PDZ domains form homotypic dimers: the PDZ domain of
the neuronal enzyme nitric oxide synthase binds to the PDZ
domain of PSD-95, an interaction that has been implicated in its
synaptic association 9. Here we report the crystal structure of the
third PDZ domain of the human homologue of the Drosophila
discs-large tumour-suppressor gene product, DlgA. It consists of a
five-stranded antiparallel β-barrel flanked by three α-helices. A
groove runs over the surface of the domain, ending in a conserved
hydrophobic pocket and a buried arginine; we suggest that this is
the binding site for the C-terminal peptide.
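
The C-terminal S/TXV consensus can be expressed as a pattern anchored at the end of a sequence. The following Python sketch is an illustration only: the function name is hypothetical and the poly-alanine test sequences are invented, not taken from any protein discussed here.

```python
import re

# Ser or Thr, any residue, then Val, anchored at the C terminus
# (the end of the one-letter sequence string).
CTERM_STXV = re.compile(r"[ST].V\Z")

def has_cterm_stxv(sequence: str) -> bool:
    """Return True if the sequence ends in an S/T-X-V tripeptide."""
    return CTERM_STXV.search(sequence.upper()) is not None

# Invented sequences: a poly-alanine stretch with different tripeptides.
examples = [
    "AAAAAATDV",  # Thr-X-Val at the C terminus
    "AAAAAASDV",  # Ser-X-Val at the C terminus
    "AAAAAAADV",  # Thr replaced by Ala
    "AAAAAATDA",  # terminal Val replaced by Ala
    "TDVAAAAAA",  # tripeptide present but not C-terminal
]

for seq in examples:
    print(seq, has_cterm_stxv(seq))
```

Anchoring the pattern with `\Z` matters because the motif is defined relative to the free carboxy terminus; an internal TDV does not qualify.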

The PDZ domain 6 is named after three of the proteins in which
the repeats have been described: PSD-95 (postsynaptic density
protein, Mr 95K), Dlg (discs-large protein) and ZO-1 (zonula
occludens-1). These proteins have a conserved structure comprising
three tandem PDZ domains, an SH3 domain and a guanylate-kinase-like
domain. Other proteins that contain PDZ domains
include certain protein kinases and protein tyrosine phosphatases,
and neuronal nitric oxide (NO) synthase. These domains appear
to be protein-recognition modules analogous to the well characterized
SH2 and SH3 domains.

We crystallized a recombinant form of the third PDZ domain
(PDZ-3) from human Dlg and solved its structure at 2.8 Å
resolution using two heavy-atom derivatives and solvent flattening
(Table 1). A complete model for the 96-residue domain has been
built; in spite of a large proportion of glycine residues (13%), all of
the secondary structure elements and connecting loops are well
ordered. The domain is compact and globular, with a diameter of
25-30 Å. β-strands 2-5 form an up-and-down β-barrel and strand
β1 crosses over the barrel and hydrogen-bonds to β5; a short
α-helix (α1) and its connecting loop cap one end of the barrel; helix
α2 caps the other end of the barrel, and a C-terminal helix (α3)
packs against the outside of the barrel (Figs 1b and 2). Most of the
conserved residues (Fig. 1a) are hydrophobic, and form the core
of the domain, which is exposed on one face (Figs 2, 3b). There are
two exceptions: a conserved aspartic acid (D510), which is buried
and forms a salt bridge to an arginine (R465); and an asparagine
(N516) in the β4-α2 loop whose side chain packs against the α2-β5
loop. Insertions and deletions in other members of the PDZ
family are restricted to the connecting loops between secondary
structure elements. One major insertion of six residues occurs in
PDZ-1 and PDZ-2 of hDlg and PSD-95, which maps to the loop
between strands β2 and β3. A well-ordered helix, α3, extends
seven residues beyond the C terminus of the published consensus
sequence for the PDZ family, and may well be present in all
members of the family. If this is the case, no or very few residues
link the C-terminal helix of PDZ domain 1 (PDZ-1) to the first
strand of PDZ-2, consistent with the finding that a recombinant
fragment containing both PDZ-1 and PDZ-2 forms a single
protease-resistant module (S.M.M. and A.H.C., manuscript in
preparation).

Several PDZ domains (the closely related second domains of
Dlg (ref. 4) and PSD-95 (refs 2, 3) and one domain of PTP-BAS
(ref. 7)) recognize peptides containing a consensus C-terminal
T/SXV sequence (in single-letter amino-acid code, where X is
any residue) with high specificity: mutation of the threonine to
alanine or of the terminal valine to either alanine or aspartate
abolishes binding. A large number of membrane-associated
proteins, including many neuronal ion channels and synaptic
receptors, share this terminal motif 2. We used the program
SURFNET, which uses only geometric criteria derived from the
atomic coordinates, to search for peptide-binding cavities on the
surface of the domain. In known protein-ligand complexes, the
program correctly predicts the ligand-binding site (corresponding
to the largest cavity) for 85% of the cases (J. Thornton, personal
communication). For the PDZ domain, the program finds only
one cavity (volume, 720 Å³) that is large enough for ligand binding
(the volume of the second largest cavity is only 150 Å³). The cavity
includes a hydrophobic pocket and a groove that runs over the top
of the molecule (Fig. 3a). The hydrophobic pocket is highly
conserved among PDZ family members (Fig. 3b) and is formed
by the β1-β2 loop (which includes the conserved GLGF motif)
and side chains from strand β2 and helix α2 (Figs 2, 3c). At the

[Fig. 1a: sequence alignment of the PDZ domain family]
FIG. 1 a, Sequence alignment of the PDZ
domain family, with secondary structure indicated
for the PDZ-3 domain of hDlg (DLG_h3).
Residue numbering at the top is that of the full-length
hDlg protein. Positions where the
chemical character of residues is conserved in
90% of sequences are highlighted in yellow.
The four residues forming the hydrophobic
pocket are marked by asterisks. The sequences
of domains PDZ-1 and PDZ-2 from hDlg are
contiguous. The sequences are from various
species (h, human; r, rat; m, mouse; d,
Drosophila; ce, C. elegans). The alignment
and the sequence numbering shown at the
bottom were also taken from ref. 1, where a full
explanation of the sequence abbreviations can
be found. The largely conserved basic residue
corresponding to R471 in hDlg is boxed. The
principal residues involved in dimerization are
T474, G475, N479, V481, E484, S492, F493, L495, A496, H525 and
A530 and are indicated with a caret (^). b, Stereo Cα plot of PDZ-3, with
every tenth residue numbered and the N and C termini indicated. The fold-search
program DALI 19 specifies the β-subunit of Klebsiella aerogenes
urease 20 as having a similar fold, classified by the SCOP database 21 as a
'β-clip' fold, which is also shared by the enzyme dUTPase 22.

FIG. 2 Ribbon diagram of the PDZ-3 domain with secondary structure elements and N and C termini
indicated, generated with MOLSCRIPT 23, RASTER3D 24 and RENDER 25. Side chains forming the hydrophobic
pocket (residues Leu 476, Phe 478, Ile 480 and Leu 532), along with Arg 471, are also shown.

FIG. 3 Surface of the PDZ-3 domain. The view
shown is similar to that in Fig. 2. a, Space-filling
model with the largest cavity found by
SURFNET 6, shown in gold. b, Space-filling
model showing conserved residues. Residues
in red are those that are highly conserved
among PDZ domains. The side-chain nitrogens
of Arg 471 are shown in blue. c, Stereo
surface-charge representation, generated
with the program GRASP 26. Regions with
positive and negative electrostatic potential
are shown in blue and red, respectively. d,
Close-up of c, with the model tripeptide TDV
shown in the hydrophobic pocket.
METHODS. A selection of probes (-OH,
CO3-, -CO, -CH3, -NH) was used to search
the SURFNET cavity for possible binding
sites using GRID 27. The tripeptide was then
manually docked into the cavity using the contour
maps from GRID as a guide. Weak harmonic
restraints were then applied to enable
the automatic generation by molecular dynamics
and simulated annealing (MODELLER 28) of a
set of 10 three-dimensional models of the
tripeptide complexed with the PDZ domain.
The model with the lowest energy was
selected.

TABLE 1 Summary of crystallographic analysis

Crystals                     Native        K2PtCl4      CH3HgNO3
Resolution (Å)               2.8           3.5          5.0
Unique reflections           5,793         1,998        1,014
Completeness (%)             100           80.1         98.6
Rsym (%) (outer shell)       10.8 (30.1)   8.6 (12.6)   4.0 (4.4)
Redundancy                   13            2.5          9
Riso (%)                                   9.6          5.6
No. of sites                               3            2
Phasing power                              1.5          0.62
Figure of merit, 0.34 (after DM 14, 0.76)

Refinement statistics
Resolution   No. of atoms   Rcryst (%)   Rfree (%)   Bond lengths (Å)   Bond angles (deg)
15-2.8 Å     727            22.0         27.1        0.008              1.4

Rsym = Σ|I - <I>|/ΣI, where I is the observed intensity and <I> is the average intensity. Riso = Σ|F_PH - F_P|/ΣF_P, where F_PH is the heavy-atom
[...] Phasing power is <|F_H|>/<E>, where E is the residual lack of closure error. Figure of merit is
[...] for reflections with F_P > 2σ(F_P). Rfree is the same as Rcryst, but
calculated on the 10% of data excluded from refinement. The domain was expressed as a glutathione-S-transferase (GST) fusion in E. coli strain BL21. The
expressed protein includes residues 457 to 552 of the human homologue of Drosophila discs-large protein 11, and was purified on glutathione-Sepharose
(Pharmacia) and eluted after protease digestion with thrombin. It was further purified on Mono-Q Sepharose (Pharmacia). The crystals grow by hanging-drop
vapour diffusion at a protein concentration of 15 mg ml-1 from 0.9-1.3 M sodium citrate, pH 6.5-7.5 at 4 °C, and adopt space group P6,22 with
[...] collected with a Rigaku [...] HB X-ray generator and image plate with Cu-Kα radiation. Data were
processed using DENZO and SCALEPACK 12. One platinum position was found from the difference Patterson map with the program RSPS 13 and further sites
were derived from difference Fouriers after solvent flattening and histogram matching using the program DM 14 from the CCP4 program suite 15. Heavy-atom
[...] 16 [...] MLPHARE 15. The high solvent content (70%) present in the crystal was a powerful constraint during phase
refinement and extension and allowed the production of an easily interpretable electron-density map 17. The refinement consisted of rounds of simulated
annealing and grouped B-factor refinement to 2.8 Å with X-PLOR 18. The present model includes 96 residues, from residues 460 to 552, plus three residues
[...]
the Ramachandran plot. The relative molecular mass and [...] were determined by sedimentation equilibrium in a Beckman Optima XL-A analytical
ultracentrifuge using a Marquardt-Levenberg non-linear least-squares-fitting model, IDEAL1, and MULTVSELF, a modified Gauss-Newton four-exponent
[...] included in the Beckman data analysis software. The rotor speed was 30,000 r.p.m. and equilibrium solute distribution was achieved
[...] seven solute concentrations in the range 0.02-1.0 mg ml-1, with scanning ultraviolet optics at wavelengths of 220 and 278 nm.


back of the pocket is a partially buried arginine, R471, from the
β1-β2 loop, which is held in a rigid conformation by hydrogen
bonds to three main-chain carbonyl oxygens, and is reminiscent of
the buried arginine involved in phosphotyrosine binding to SH2
domains 9. A positively charged residue (arginine or lysine) is
present in this position in all PDZ domains that are known to
bind C-terminal peptides (and in ~85% of known PDZ domains).
We modelled a consensus tripeptide, TDV, into the putative
binding site, to see if the peptide could be accommodated into
this cavity with good stereochemistry and complementarity of
non-covalent interactions (Fig. 3d). The lowest-energy model
meets these criteria, and orients the valine side chain into the
hydrophobic pocket, with the terminal carboxylate salt-bridging to
the arginine. The aspartic acid side chain points out into solution;
the threonine hydroxyl makes a hydrogen bond with the carbonyl
group of a conserved glycine (G477). The shortest peptides that
bind PDZ domains are nine residues long; residues upstream of
the C-terminal tripeptide could be accommodated within the
groove that extends over the top of the domain. The modelling